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Abstract 

Motivated by the limitation of analyzing oscillatory signals composed of multiple compo¬ 
nents with fast-varying instantaneous frequency, we approach the time-frequency analysis 
problem by optimization. Based on the proposed adaptive harmonic model, the time- 
frequency representation of a signal is obtained by directly minimizing a functional, which 
involves few properties an “ideal time-frequency representation” should satisfy, for exam¬ 
ple, the signal reconstruction and concentrative time frequency representation. FISTA 
(Fast Iterative Shrinkage-Thresholding Algorithm) is applied to achieve an efficient nu¬ 
merical approximation of the functional. We coin the algorithm as Time-frequency bY 
COnvex OptimizatioN (Tycoon). The numerical results confirm the potential of the 
Tycoon algorithm. 

Keywords: Time-frequency analysis, Convex optimization, FISTA, Instantaneous 
frequency, Chirp factor 


1. Introduction 

Extracting proper features from the collected dataset is the first step toward data 
analysis. Take an oscillatory signal as an example. We might ask how many oscilla¬ 
tory components inside the signal, how fast each component oscillates, how strong each 
component is, etc. Traditionally, Fourier transform is commonly applied to answer this 
question. However, it has been well known for a long time that when the signal is not 
composed of harmonic functions, then Fourier transform might not perform correctly. 
Specifically, when the signal satisfies /(f) = J2k=i A k {t) cos(2ncj>k(t)), where K G N, 
Aj-(t) > 0 and <j>' k {t ) > 0 but A k {t) and <t>' k {t) are not constants, the momentary be¬ 
havior of the oscillation cannot be captured by the Fourier transform. A lot of efforts 
have been made in the past few decades to handle this problem. Time-frequency (TF) 
analysis based on different principals m lias attracted a lot of attention in the field and 
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many variations are available. Well known examples include short time Fourier trans¬ 
form (STFT), continuous wavelet transform (CWT), Wigner-Ville distribution (WVD), 
chirplet transform [59j, S-transform 05], etc. 

While these methods are widely applied in many fields, they are well known to be 
limited, again, by the Heisenberg uncertainty principle or the mode mixing problem 
caused by the interference known as the Moire patterns (21] • To alleviate the shortage 
of these analyses, in the past decades several solutions were proposed. For example, 
the empirical mode decomposition (EMD) [3D] was proposed to study the dynamics 
hidden inside an oscillatory signal; however, its mathematical foundation is still lacking 
at this moment and several numerical issues cannot be ignored. Variations of EMD, like 
[HI 02 EH 031123, were proposed to improve EMD. The sparsity approach [281 1261 l27ll47j 
and iterative convolution-filtering [35] US, 12] T3| are another algorithms proposed to 
capture the flavor of the EMD, which have solid mathematical supports. The problem 
could also be discussed via other approaches, like the optimized window approach [44] . 
nonstationary Gabor frame [5|, ridge approach |44j . the approximation theory approach 
El, non-local mean approach [23] and time-varying autoregression and moving average 
approach [18] . to name but a few. Among these approaches, the reassignment technique 
[551 El 151H] and the synchrosqueezing transform (SST) [TB] 1T5. ID] have attracted more and 
more attention in the past few years. The main motivation of the reassignment technique 
is to improve the resolution issue introduced by the Heisenberg principal - the STFT 
coefficients are reallocated in both frequency axis and time axis according to their local 
phase information, which leads to the reassignment technique. The same reassignment 
idea can be applied to a very general settings like Cohen’s class, affine class, etc . 22] , 
SST is a special reassignment technique; in SST, the STFT or CWT coefficients are 
reassigned only on the frequency axis on mm so that the causality is preserved and 
hence a real time algorithm is possible m- The same idea could be applied to different 
TF representation; for example, the SST based on wave packet transform or S-transform 
is recently considered in [51132- 

By carefully examining these methods, we see that there are several requirements a 
time series analysis method for an oscillatory signal should satisfy. First, if the signal is 
composed of several oscillatory components with different frequencies, the method should 
be able to decompose them. Second, if the oscillatory component has time-varying 
frequency or amplitude, then how the frequency or amplitude change should be well 
approximated. Third, if any of the oscillatory component exists only over a finite period, 
the algorithm should provide a clear information about the starting point and ending 
point. Fourth, if we represent the oscillatory behavior in the TF plane, then the TF 
representation should be sharp enough and contain the necessary information. Fifth, the 
algorithm should be robust to noise. Sixth, the analysis should be adaptive to the signal 
we want to analyze. However, not every method could satisfy all these requirements. 
For example, due to the Heisenberg uncertainty principle, the TF representation of the 
STFT is blurred; the EMD is sensitive to noise and is incapable of handling the dynamics 
of the signal indicated in the third requirement. In addition to the above requirements, 
based on the problem we have interest, other features are needed from the TF analysis 
method, and some of them might not be easily fulfilled by the above approaches. 

Among these methods, SST [IB] HUE] an d its variation [3U EH, GET] 02] could simulta¬ 
neously satisfies these requirements, but it still has limitations. While SST could analyze 
oscillatory signals of “slowly varying instantaneous frequency (IF)” well with solid math- 
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ematical supports, the window needs to be carefully chosen if we want to analyze signals 
with fast varying IF [35]. Precisely, the conditions |A' fc (t)| < e^(t) and |^(t)| < ecj)' k (t) 
are essential if we want to study the model f(t) = Y^k=i Ak(t) cos(27r</>fc(f)) by the 
current SST algorithm proposed in dang Eg. Note that these “needs” could be under¬ 
stood/modeled as some suitable constraints, and to analyze the signal and simultaneously 
fulfill the designed constraints, optimization is a natural approach. Thus, in this paper, 
based on previous works and the above requirements, we would consider an optimization 
approach to study the oscillatory signals, which not only satisfies the above requirements, 
but also captures other features. In particular, we focus on capturing the fast varying IF. 
In brief, based on the relationship among the oscillatory components, the reconstruction 
property and the sparsity requirement on the time-frequency representation, we suggest 
to evaluate the optimal TF representation, denoted as F, by optimizing the following 
functional 


n(F,G) 



F(t,u))duj - f(t) 




+ M 


| d t F(t,u) — i2'KUjF{t,u]) + G(t,uj)d ul F(t,oj)\ 2 dtdu} 


+ \\\F\\ l1+1 \\G\\ L 2, 


( 1 ) 


where G is an auxiliary function which quantifies the potentially fast varying instanta¬ 
neous frequency. When G is fixed, it is clear that although T-L(-,G) is not strictly convex, 
it is convex, so finding the minimizer is guaranteed. To solve this optimization prob¬ 
lem, we propose to apply the widely applied and well studied algorithm Fast Iterative 
Shrinkage-Thresholding Algorithm (FISTA). Embedded in an alternating minimization 
approach to estimate G and F, we coin the algorithm as Time-frequency bY COnvex 
OptimizatioN (Tycoon). 

The paper is organized in the following way. In Section [2] we discuss the adaptive 
harmonic model to model the signals with a fast varying instantaneous frequency and its 
identifiability problem; in Section [3] the motivation of the optimization approach based 
on the functional ([T]) is provided; in Section[4j we discuss the numerical details of Tycoon. 
In particular, how to apply the FISTA algorithm to solve the optimization problem; in 
Section [5] numerical results of Tycoon are provided. 


2. Adaptive Harmonic Model 

We start from introducing the model which we use to capture the signal with “fast 
varying IF”. The oscillatory signals with fast varying IF is commonly encountered in 
practice, for example, the chirp signal generated by bird’s song, bat’s vocalization and 
wolf’s howl, the uterine electromyogram signal, the heart rate time series of a subject 
with atrial fibrillation, the gravitational wave and the vibrato in violin play or human 
voice. More examples could be found in [22] . Thus, finding a way to study this kind 
of signal is fundamentally important in data analysis. First, we introduce the following 
model to capture the signals with fast varying IF, which generalizes the A Cl f 2 class 
considered in mm-- 
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Definition 2.1 (Generalized intrinsic mode type function (gIMT)). Fix constants 0 < 
e <C 1 , C2 > Ci > e and C2 > C3 > e. Consider the functional set Q“ 1,c 2,03 , which consists 
of functions in C 1 (M) fl L°°(K) with the following format: 

git) = Aft) cos( 2 t T(f(t)), (2) 

which satisfies the following regularity conditions 

lec'fRjnrfR), </>gc ,3 (k), ( 3 ) 

the boundedness conditions for all t G K. 

inf Alt) > ci, inf (/ft) > Ci, (4) 

teR teR 

sup A(t) < c 2 , sup<//(f) < c 2 , sup \(t>"ft)\ < C3, 

teR teR teR 

and the growth conditions for all t G ffi. 

\A'(t)\<ec/ft), \r (t)| < e</(t). (5) 

Definition 2.2 (Adaptive harmonic model). Fix constants 0 < e <C 1, d > 0 and 
c 2 > Ci > 0. Consider the functional set Qf) C2 ’ C3 , which consists of functions in C 1 (R)n 
L°°(]R) with the following format: 

K 

g(t) =^2ge(t), (6) 

t=i 

where I\ is finite and gift) = A/t) cosfZircfift)) G Of 1 ’ 02,03 ; when K > 1, the following 
separation condition is satisfied: 

<t>e+i(t) - <f/ e (t) > d (7) 


for all £ = 1 ,K — 1. 

We call e, d , Ci, c 2 and C 3 model parameters of the Q r f£ 2 ' C3 model. Clearly, Qf ,C2,C3 C 
Qf'ff 2 ’ 03 and both Q c /’ c2,1=3 and Q/f 2 ’ 03 are not vector spaces. Note that in the A// 2 
model, the condition “fa G C 3 (K), sup tgR \(/fff)\ < c 2 and \(//ft)\ < efafb) for all t G K” 
is replaced by “fa G C 2 (M) and \(/fft)\ < efaft) for all t G R”. Thus, we say that the 
signals in A// 2 are oscillatory with slowly varying instantaneous frequency. Also note 
that A/f 2 is not a subset of Q C // 2 ’ C3 . Indeed, for A/t) cos(27r^(t)) G A// 2 , even if 
<fn G C' 3 (M), the third order derivative of is not controlled. Also note that the number 
of possible components K is controlled by the model parameters; that is, K < C2 ~ Cl . 

Remark. We have some remarks about the model. First, note that it is possible to intro¬ 
duce more constants to control Aft), like 0 < C 4 < inf igR Aft) < sup tgR Aft) < C 5 , in ad¬ 
dition to the control of fa by ci, c 2 > 0 in the model. Also, to capture the “dynamics”, we 
could consider a more general model dealing with the “sudden appearance/disappearance ”, 
like gft) = geft)xi e > where \ is the indicator function and Ie C ffi. is connected and 
long enough. However, while these will not generate fundamental differences but will 
complicate the notation, to simplify the discussion, we stick to our current model. 
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Second, we could consider different models to study the “fast varying IF”. For ex¬ 
ample, we could replace the condition “|^4 , (i)| < e< ft ft), 4>g £ C 3 (R), sup tgR \<ftfft)\ < c-i 
and \<ftft(t)\ < ecftfft) for all t £ R” by the slow evolution chirp conditions W2jj : that 
is “|A , (t)| < eAft)(ft(t), 4>e £ C 2 (R) and \<ftfft)\ < eftfft) 2 for all t £ R”. We re¬ 
fer the reader with interest in the detailed discussion about this “slow evolution chirp 
model” to l22i Section 2.2]. A simplified slow evolution chirp model (with the condition 
|A 7 (t)| < eft ft)) is recently considered in jpg for the study of the sparsity approach to 
TF analysis. We mention that the argument about the identifiability issue stated below 
for Q c ff 2 ’ C3 could be directly applied to state the identifiability issue of the slow evolution 
chirp model. 

Before proceeding to say what it means by “instantaneous frequency” or “amplitude 
modulation”, we immediately encounter a problem which is understood as the identi¬ 
fiability problem. Indeed, we might have infinitely many different ways to represent a 
cosine function g 0 (t) = cos(27rt) in the format aft) cos(27r <j>(t)) so that a > 0 and ft > 0, 
even though it is well known that go ft) is a harmonic function with amplitude 1 and 
frequency 1. Precisely, there exist infinitely many smooth functions a and ft so that 
goft) = cos(2nt) = (1 + aft)) cos(27r (t + /3(t))), and in general there is no reason to favor 
a(t) = /3(t) = 0. Before resolving this issue, we could not take amplitude 1 and frequency 
1 as reliable features to quantify the signal go when we view it as a component in Qf 1,C2,C3 . 
In [5], it is shown that if gft) = Aft) cos(27r </>ft)) = [Aft) + aft )] cos(27r[</(£) + /3(t)]) are 
both in Aft ft 2 , then |a(i)| < Ce and \fi'{t)\ < Ce, where C is a constant depending only 
on the model parameters Ci, C 2 , d. Therefore, Ag and cf)'g are unique locally up to an error 
of order e, and hence we could view them as features of an oscillatory signal in A c ff(f 2 . 
Here, we show a parallel theorem describing the identifiability property for the functions 
in the Qff 2 ' C3 model. 

Theorem 2.1 (Identifiability of Q ^ 1,c2,C3 ). Suppose a gIMT aft) cos <f>(t) £ Q] 1,c2,03 can 
be represented in a different form which is also a gIMT in Q] 1,c 2,03 ; that is, aft) coscfft) = 
Aft) cos ipft) £ Q^ 1 ’C 2 ,c 3 _ Define t m := 4>^ 1 ((m+ 1/2)7r) and s m := (/ _1 (m7r), m £ Z, 
a(t) := Aft) — aft), and fift) := ip(t) — 4>(t). Then we have the following controls of a 
and f3 at t m and s m 

1. Up to a global factor 2ln, l £ Z, fi ft n ) = 0 for all n £ Z; 

2 - aiC'f+'litn) = ^ J 0r al1 n € In P articular > a (^) 

fi'ft n ) = 0 for all n £ Z; 

3. a ( s a )+^( s ) = cos (/3(s n )) for all n £ Z. In particular, a(s m ) 

P{s m ) = 0, m£ Z. 

Furthermore, the size of a and (3 are bounded by 

1. |a(£)| < 2ne for all t £ R; 

2- \]3"ft)\ < 2i re, \fi'ft)\ < ^ and \fi(t)\ < up to a global factor 2ln, l £ Z, for 
all t £ R. 

We mention that the controls of a and /? at t m and s m do not depend on the growth 
condition in ([5|. However, to control the size of a and /?, we need the growth condition 

in (§. 


= 0 if and only if 
= 0 if and only if 
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Theorem 2.2 (Identifiability of Q/f 2 ’ C3 )- Suppose f(t) £ Off/ 2 ’ 03 can be represented 
in a different form which is also in Q/f 2 ’ C3 ; that is, 

N M 

/(f) = ^2 ai(t) cos <f>i(t) = ^2 Ai(t) cos <pi(t) £ Q c /f 2,C3 . (8) 

1=1 1=1 

Then, when d > y^2 In | In C 3 — In e, M = N and for all t £ R. and for all l = 

1 .. .., N, the following holds: 

1. \<pi(t) — <fii(t.) | = 0(i/e) up to a global factor 2mr, n £ Z; 

2 . \m-m\ = 0{y/e); 

3. W(t)-tf(t)\=0(Ve); 

4. \ai(t) - A t (f)| = 0(y/e), 

where the constants on the right hand side are universal constants depending on the model 
parameters of Q C // 2 ’ C3 . 

Note that in this theorem, the bound s/e and the lower bound of d are by no means 
optimal since we consider the case when there are as many components as possible. We 
focus on showing that even when there are different representations of a given function 
in Q Cl / 2 ’ C3 , the quantities we have interest are close up to a negligible constant. As a 
result, we have the following definitions, which generalize the notion of amplitude and 
frequency. 

Definition 2.3. [Phase function, instantaneous frequency, chirp factor and amplitude 
modulation] Take a function f(t) = ag(t) cos (ft (t) £ Q// 2,C3 . For each l = 

1.. .. ,N, the monotonically increasing function is called the phase function of the 

£-th gIMT; the first derivative of the phase function, <f>'/f), is called the instantaneous 
frequency (IF) of the I-th gIMT; the second derivative of the phase function, 4>"{t), is 
called the chirp factor (CF) of the I-th gIMT; the positive function A/t) is called the 
amplitude modulation (AM) of the I-th gIMT. 

Note that the IF and AM are always positive, but usually not constant. On the 
other hand, the CF might be negative and non-constant. Clearly, when <p£ are all linear 
functions with positive slopes and Ag are all positive constants, then the model is reduced 
to the harmonic model and the IF is equivalent to the notion frequency in the ordinary 
Fourier transform sense. The conditions |AJ,(t)| < ecf/t) and \4>"'(t)\ < e<j>' e (t) force 
the signal to locally behave like a harmonic function or a chirp function, and hence the 
nominations. By Theorem |2.1| and Theorem |2.2[ we know that the definition of these 
quantities are unique up to an error of order e. 

We could also model the commonly encountered ingredient in signal processing - the 
shape function, trend and noise as those considered in HUGH- However, to concentrate 
the discussion on the optimization approach to the problem, in this paper we focus only 
on the Q C // 2,C3 functional class. 

3. Optimization Approach 

In general, given a function f(t) = /2k=i Ak{t) cos(27r/>fc(t)) so that Afc(t) > 0 and 
4>' k {t) > 0 for t £ R, we would expect to have the ideal time-frequency representation 
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(iTFR), denoted as Rf(t,cj), satisfying 

K 

Rf(t,u) = £ A fc (t)e^%, (t) M. (9) 

fc=1 

where (*) is the Dirac measure supported at <t>' k (t), so that we could well extract the 
features A k {t) and (l)' k (t) describing the oscillatory signal from Rf. Note that the iTFR 
is a distribution. In addition, the reconstruction and visualization of each component 
are possible. Indeed, we can reconstruct the fc-th component by integrating along the 
frequency axis on the period near (f>' k (t). Indeed, 

A k (t) cos(2n(f>k{t)) = 5ft ^ R f (t,uj)ip dw > ( 10 ) 

where 5ft means taking the real part, 0 <C 1, ip is a compactly supported Schwartz function 
so that ^(0) = 1. Further, the visualization is realized via displaying the “time-varying 
power spectrum” of /, which is defined as 

K 

■= ( n ) 

fc=l 


and we call it the ideal time-varying power spectrum (itvPS) of /, which is again a 
distribution. 

To evaluate the iTFR for a function / = Y^k =l Ak(t) cos(27r</>fc(t)), we fix 0 < 9 <C 1 
and consider the following approximative iTFR with resolution 6 

R f (t,u0 = J2MtV 2 * Mt) -h ^ (12) 

k =1 ' ' 

where t £ R, lo £ R. and h is a Schwartz function supported on [—a, a], a > 0, so that 
J h = 1 and -h (-) converges to Dirac measure <5 supported at 0 weakly as e —> 0 and 
f h(x)dx = 1. Clearly, we know that Rf is essentially supported around (t,<j>' k (t)) for 
k = 1,..., K and as 9 —> 0, Rf converges to the iTFR in the weak sense. Also, we have 
for all 1 6 1 and k = 1,..., K, when 9 is small enough so that o9 > d is satisfied, where 
d is the constant defined in the separation condition in ([7]), we have 

r<f>kM+ ,J 9 

5ft / Rf{t, w)dw = Ak(t) cos(27r</>fc(f)). (13) 

J <j>' k (t)-a9 

Thus, the reconstruction property of iTFR is satisfied. In addition, the visualization 
property of itvPS can be achieved by taking 


Sf(t,uj) = 



(14) 


where the equality holds due to the facts that cj>' k are separated and 6 1. Next we need 

to find other conditions about Rf. A natural one is observing its differentiation. By a 
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direct calculation, we know ^h' = d u \h ^ and hence we have 


dtRfM =j2 A 'k(t)e i2 ^h ( u 

k = l ' ' 

+12*^2 Ak(t)MtW 2 " Mt) ]jh ~ 


k—1 


( u f k{t] 


k =1 


= -£Mt)e‘^h 

k =1 ' ' 


if 


+ i2n 


/ (+\p^^k (*)!/, ^ 


K 


+ d u J2 A k (ty^ct>i(t)\h ( u f {t) ) 


k=1 


(15) 


( 16 ) 


By the fact that ojRf(t,oj) = J2k=i A k {t)u}e' l2 ' KC l >k ^ ^ we have 

d t Rf{t,uj) — i2iru]Rf(t,uj) 

=jtA' t (ty™>h(y*V) 

k =1 ' ' 

- ^ f; A k (t)(w - ^ (t))e» 2 ^wlfe ~ 

k= 1 ' ' 

+ a w f]^We i27r0,t(t) ^W^ • 

fc=i ' ' 

We first discuss the case when / G A^f 2 ', that is, \(j> k {t)\ < e|(^,(f)| for all f e M. 
Note that by the assumption of frequency separation ([TJ) and the fact that 6 -C 1, 
[(j)'i(t) — da, (j)\(t) + da] fl [<j> k (t) — da, <t> k (t) + da] = 0 when l ^ k. Thus we have 


y \ Am 2 y ( u y M y (17) 

k =1 ' ' k—1 ' ' 


Indeed, when w G [<(>((t) — da, + da], we have 


k =1 ' ' 


= kcoi 2 ^ 2 




( 18 ) 
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The same argument holds for the other terms on the right hand side of (161. As a result, 

(19) 


by a direct calculation, for any non-empty finite interval 1 C K, we have 


Vd (d t Rf(t,uj) — i2nujRf(t,uj)^ 


L 2 (/x[0,oo)) 

,2 2 
fc C 2 


< Jo, 0,2 + 27 r#eJi i o ,2 + 4-7T 9 ^2,0,2 d— Jo,1,2 J c 2 I, 

where J n ,m,l ■= J V n [d™h(r])] l dri, where n,m,l = 0,1,.... Thus, when e is small enough, 

\fd (dtRf(t,oj ) — i2nuiRf(t,uj)) is small. Here, we mention that as the dy- 

namic inside the signal we have interest is “momentary”, we would expect to have a small 
error between dtRf(t,co) and i2irujRf(t, ui) “locally”, which however, might accumulate 
when I becomes large. This observation leads to a variational approach discussed in IlSl- 
Precisely, the authors in [15] considered to minimize the following functional when the 
signal / is observed on a non-empty finite interval I: 


Hq(F) := J F(t,cv)duj — f(t) d t 

+ fi JJ \d t F(t, to) — i2nujF(t,uj)l 2 dfdw. 


( 20 ) 


The optimal F would be expected to approximate the iTFR of / € A c y C2,C3 well. How¬ 
ever, that optimization was not numerically carried out in Ha- 

Now we come back to the case we have interest; that is, / £ Q^f 2 ’ 03 . Since the 
condition on the CF terms, that is, |<// fe / (f)| < e\(j)' k (t)\, no longer holds, the above bound 
(19) does not hold and minimizing the functional Rq might not lead to the right solution. 
In this case, however, we still have the following bound by the same argument as that of 

(fl9l): 


VO (d t R f (t,u) - i2nuR f (t,w) - d u ^ A fc (t)e i2 ^VfcW^ ( U 


k =1 


< 


VeJ2 K(t) - i^A k (t){u <j>' k {t)))e a *+M±h f {t) 


k =1 


L 2 {I x [0,oo)) 

( 21 ) 


L 2 (I x [0,oo)) 


< (e 2 Jo, 0,2 + 27re0Jpo,2 + 47r"0 2 J 2 ,o, 2 ) c 2 I, 

Thus, once we find away to express the extra term d u Ylk= 1 A k {t)e l2 ' K ' l,k J^ <j)'{,{t)^h 
in a convenient formula, we could introduce another conditions on F. 

In the special case when K = 1; that is, / = A(t) cos(27rwe know that 

'ui — 4>’{t) ' 

9 


d„ 


A(ty 2 *^ (t>" {t)\h 


9 


= <j>"{i)d u Rf(t,w). 


( 22 ) 


Thus, we have 

9 [I \d t Rf(t,uj) — i2TTUjRf(t,ijj) + (l)"(t)d ul Rf(t,u})\ 2 dtdu} = 0(9 2 ,9e,e 2 ). 


(23) 
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Thus, we could consider the following functional 

9 JJ \d t F{t,uj) — i2nujF(t,u}) + a(t)9 w F(f, w)| 2 dtdu;, (24) 

where a(t) £ K is used to capture the CF term associated with the “fast varying instan¬ 
taneous frequency”. Thus, when K = 1, we can capture more general oscillatory signals 
by considering the following functional when the signal is observed on a non-empty finite 
interval I Cl: 

"H(F, a) := J $1 J F(t, ui)dui — f(t) d t 

+ fid JJ \d t F(t,uj) — i2nujF(t,uj) + a(t)d u F{t,uj)\ 2 dtduj (25) 

+ A||F1|li(/ x r) + 7lHU 2 (/xR)) 

where F £ L 2 (/ xR) is the function defined on the TF plane restricted on JxR. Note 
that the L 1 norm is another constraint we introduce in order to enhance the sharpness of 
the TF representation. Indeed, we would expect to introduce a sparse TF representation 
when the signal is composed of several gIMT. 

In general when I\ > 1, we cannot link d u Ak(t)e l2n ^ k ^<j)'l(t)^h to 

1 9uRf(t,u>) by any function on t like that in (22). In this case, we could expect to find 
another function G £ L 2 (/ xR) so that 


G(f , when w ftkit) + e<J \ 

0 otherwise. 


(26) 


and hence G{t,co)d u Rf{t, w) = d w ■^k(t)e l2n< ^ k ^ ^ . Thus, we could 

consider minimizing the following functional for a given function / observed on a non¬ 
empty finite interval IcR: 


di 


H(F,G) := J 5R J F(t,w)du> - f(t) 

+ fid JJ \d t F(t,uj) — i2'KUjF{t 1 bj) + G(t,aj)9 w F(t,w)| 2 dtdw (27) 

+ ^li^lUq/xR) + JJ\\G\\ L 2 ( i xR) . 

Here, the L 2 penalty term ||G||l 2 has 1 /Vd in front of it since 


K 


|G|| 


L 2 (Ix [0,oo)) 


= V29aJ2 || <t>\ 


fcl|l/ 2 (/x[0,Oo))’ 


(28) 


k=1 


Thus, the L 2 penalty term does not depend on d. It is also clear that the L 1 penalty 
term in the above functional does not depend on 9 as we have 


J Jl k=i ' 


K 


dtdw = E \\Ak{t)\\Li(I)- 


(29) 


fc=i 
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4. Numerical Algorithm 


We consider the following functionals associated with (251: 


T~L{F,a) = [ f F(t,u>)dio — /(f) 
Jr Jr 


d t 


(30) 


+/ J J | d t F(t,u>) - + a(t)d u F(t,u})\ z dtdco + (1 - A)||.F|| l i ) + tII^IIz, 2 

= G(F, a) + 4'( F,a ), 


where 


Q(F,a):=f ft [ F(t,w)du-f(t) 
Jr Jr 


d t 


(31) 


+ jlX // \dtF(t,cj) — i2 / KUjF(t^uj) + a{t)d^F{t,uj)\ dtdoj, 


*(F,a) :=A(l-A)||P’|Ui+ 7 |H|i a , 


(32) 


t is the time and u> is the frequency. The numerical implementation of (27) follows the 


same lines while we have to discretize a two dimensional function G. Compared to (251, 
we have redefined the role of the hyperparameter fi and A. Here, ft £ M+ balance between 
the data fidelity term f R 9? f R F(t,w)duj — /(f) | dt allowing the reconstruction, and the 
regularization term which controls the variation on the derivatives, and the sparsity 
of the solution. The parameter A £ [0,1] allows one to balance between the sparsity 
prior and the constraint of the derivatives. This choice will simplify the choice of the 
regularization parameters. Clearly, by setting n = fj,\ and A = /.(! — A) we recover the 


original formulation (251. 


4-1. Numerical discretization 

Numerically, we consider the following discretization of F by taking A t > 0 and 
> 0 as the sampling periods in the time axis and frequency axis. We also restrict 
F to time [0,MA t ] and to the frequencies [— NA u , NA^]. Then, we discretize F as 
F £ C (N+1)X(M+1) an j a ag a g JM+/ where 

Fn.m — F (t rrl , ej n ), a m = o:(t m ), (^^) 

t m := mA t , io n ■= nAu, n = —N ,..., N and m = 0,1,..., M. The observed signal /(f) 
is discretized as a (M + l)-dim vector /, where 

fi = f(U )• (34) 

Note that the sampling period of the signal A t and M most of time are determined by 
the data collection procedure. We could set A w = and N = [M/2] suggested by 
the Nyquist rate in the sampling theory. 
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Next, using the rectangle method, we could discretize Q{F,a) directly by 


G(F,a) 


M 


: =E 

771—0 


N 

Y 23?(F(t m , Wn ))A w 

n=—N 
M N 


f(tn 


A t 


(35) 


T p 'y ' y ' | dtF(t rn ,(jj n ') i2TTUJ rl F(t rn , co n j “b oc(t m ')dcjF(t rn , cu n )| AtA u 

m—0 n——N 


The partial derivative d t F can be implemented by the straight finite difference; that is, 
take a (M + 1) x (M+ 1) finite difference matrix Dm +i so that FDm+ i approximates the 
discretization of dtF. However, this choice may lead to numerical instability. Instead, 
one can implement the partial derivative in the Fourier domain, using that dtF(t m ,L 0 n ) = 
[m], where F = F(F) and F denotes the finite Fourier transform. 
For the sake of simplicity, we still denote by dt or d u the discretization operator in the 
discret domain, whatever the chosen method (finite difference or in the Fourier domain). 
Also denote 1 = (1,..., 1) T G R M+1 . In the matrix form, the functional G(F, a) is thus 
discretized as 


G(F, a) = A t || AF - F\\ 2 + A t A u p || B(F, a)|| 2 , 


where 


^4 ; (^(M + l) X (M+l) y j^M+1 

F H> 2SR(l T F)A bJ , 


(36) 


(37) 


. (j^(M+l)x(M+l) x f^M +1 ^(iV+l)x(M+l) 

(F, a) i->- d t F — i2TrijjF + 3 w Fdiag(o!), 

and w = diag(-AA w ,..., 0, A w , 2A W ,..., NA U ) G r(m+i)x(m+i). 

4-2. Expression of the gradient operator 

Denote G a (F) := F Q{F,a) and B a (F) :=Fg B(F,a ); that is, a is fixed. 
Similarly, define Gp(a) := a Q(F , a) and Bf(ol) := a i— > B(F , a); that is, F is fixed. 
We will evaluate the gradient of Ga and Gf after discretization for the gradient decent 
algorithm. Take G G Cd M+1 ) x ( M+1 ). The gradient of Ga after discretization is evaluated 

by 


VG a \ F G 


Ga(F + hG) - Ga(F) 
h—>-0 h 

2A t (AF - f) T AG + 2A t A u p(B a F, B a G) 
(2A t A* (AF -f) + 2A t A u pB* a B a F, G). 


(39) 


As a result, we have 


\7GaW = 2A t A*(AF -f) + 2A t A UJ pB* a B a F. 
12 


(40) 



where A* and B* a are adjoint operators of A and B a respectively. Now we expand A* 
and B* a . Take g € K M+1 . We have 


(AF,g) 


M f N \ 

E E 2 ® F n,mA u g rn 


m —0 \n——N / 

M N 

E E 23fJF njm 3?(A w g m ), 

m— 0 n=—N 


(41) 


and 


M N 


{F, A*g) = E E F n,m(A*g) n , m (42) 

m—0 n——N 

M N M N 

= E E g)n,m + ^ ^ g)n,m 

m—0 n=—N m—0 n=—N 

M N M N 

+ *E E % F n, m mA*g) n ,m - i E E ^n, m 9M*fl)n,m- 


9 1 ■ 

• • Sm+i 

,91 ■ 

• ■ 9m+ i 


m—0 n——N m—0 n——N 

Since (AF,g) = (F,A*g) for all F and g, we conclude that 

(j-<(M+l)x(M+l) 

g i—>• 2 A l 

To calculate B* a , by a direct calculation we have 

(B a F, G) = {.d t F - VI-kujF + d w Fdiag(a), G) 

= (. F , -dtG + i2nujG - d u Gdiag(a)) 

= (F, B* a G ), 

where G € C^ M+1 ^ X ^ M+1 \ Thus, we conclude that 

2 ^* . ^(M+l)x(M+l) y (^(M+l)x(M+l) 

G i->- -~d t G + i2nu:G — d UJ Gdiag(a). 


(43) 


(44) 


(45) 


As a result, the first part of 2A t A*(AF — /), can be numerically expressed as 


/ n n \ 

I AE Fn,i f i • • • E J’n.M+r - /m+i ' 

n—0 n =0 


4 A* A. 


JV N 

E Fn,l -f 1 ... E ^n,M+l - / 

v n—0 n=0 




m+i y 
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and the second term 


2A t A u nB*BF = 2A t A u n (-d t d t F + i^ujd t F - d t d u Fdiag{a) + 4t t 2 uj 2 F (47) 

+i2TTuid LU Fdi&g(ct) — d u d t Fdiag(a ) + i27rd w W-Fdiag(a) — d u d u Fdi&g(a)). 

Similarly, by taking j3 £ C M+1 , the gradient of Qp at a after discretization is evalu¬ 
ated by 


vg F \ a f3 


j . Gf{ol + h/3) — Gf(oc) 
h — h 
i^((d UJ Fdiag(P))*(d u Fdiag(a)) 
-f3*tT(F*d bJ d UJ Fdiag(a)). 


Thus, we have 


VGf |« = —tr(F*d UJ d UJ Fdiag(a)) £ C M+1 , 


where 

N 

(yQF\a)m= Oi(tm) ^ [9 u F(t m , W„)] 2 . 

n——N 


(48) 


(49) 


(50) 


4-3. Minimize the functional / H(F, a) 

We now have all the results needed to propose an optimization algorithm to minimize 
the functional T-L(F, a). The minimization of TL(F,G) in (27) is the same so we skip 


it. The functional we would like to minimize depends on two terms, F and a. While 
the PALM algorithm studied in [Bj provides a simple procedure to minimize ( p5| , this 
algorithm appeared to be too slow in practice for this problem. Since the functional 
spaces F and a live are convex, we will therefore minimize the functional alternately by 
optimizing one of these two terms when the other one is fixed; that is, 


F k + 1 = argmin’H(F,a fc ) 

F 

a k+ i = arg min % (Tfc+i, a). 


(51) 


with ao = 0 and Fq = 0 are used to initialize the algorithm. A discussion on convergence 
results of this classical Gauss-Seidel method can be found in [B]. 

As we will see in next subsections, if we can reach the global minimizer of a £ 
H(F k+ i,a), finding a minimizer of F i-> 'H(F,ak) requires the use of an iterative algo¬ 
rithm. We provide in |Appendix C a convergence result of the practical algorithm we 
propose. 


4-4- Minimization of T-L a := %{■, a) 

When a is fixed, T~L a is a convex non smooth functional, involving a convex and 
Lipschitz differentiable term (the function Q a := C/(-,a)), and a convex but non-smooth 
term (the Tq := ^(^a) regularizer). Popular proximal algorithms such as forward- 
bacward [2] or the Fast Iterative Shrinkage/Thresholding Algorithm (FISTA) |2[7] can 
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then be employed. FISTA has the great advantage to reach the optimal rate of con¬ 
vergence; that is, if F is the convergence point, 1~L a {F k ) — 7i a (F) = O (iy), while the 
forward-backward procedure converge in O (!) (see |48] for a great review of proximal 
methods and their acceleration). This speed of convergence is usually observed in prac¬ 
tice |38j , and has been confirmed in our experiments (not shown in this paper). Contrary 
to the forward-backward, one limitation of the original FISTA [5] is that the convergence 
is proven only on the sequence (' H a {F k )) k rather than on the iterates (F k ) k . However, 
the latest study [7] gives a version of FISTA, which fills in this gap while maintaining 
the same convergence rate. As far as we know, it is the only algorithm with these two 
properties, and then will be use in the following. Yet another shortcoming of the original 
FISTA is that the algorithm does not produce a monotonic decreasing of the functional, 
but a monotonic version is available [4] and is used in this paper. 

In short, FISTA relies on three steps 


1. A gradient descent step on the smooth term Q a ] 

2. A soft-shrinkage operation, known as the proximal step; 

3. A relaxation step. 


The algorithm is summarized in Algorithm [l] In practice, the Lipscliitz constant can be 
evaluated using a classical power iteration procedure, or using a backtracking step inside 
the algorithm (see [5] for details). VG a is given by Eq. (46) and Eq. (47). 

Moreover, when the signal / is real and a is real, we can limit the optimization to 
the positive frequencies such that F £ q(n+i)x(m+i) ^ N = \M/ 2], Indeed, one 

can show that there exists a solution F which has an Hermitian symmetry properties, 
i.e. such that F{t,u>) = F(t, —uj). In order to prove this result, we remark that we have 


V<7 q |f(£, -w) = VC/qIf (t, -w), 


(52) 


which can be easily checked thanks to Eq. (46) and (47). Then, if F 0 is Hermitian 
symetric, one can prove by induction that at each iteration, F k is Hermitian symmetric. 


4-5. Minimization of Up := TL(F, •) 

Once F k is estimated, the minimization of H.F k reduces to a simple quadratic mini¬ 
mization: 


M N 


atk+i = argrnin < n ^ \d t F(t m , w n ) - i2-Ku n F{t m ,u> n ) + a(t m )d LJ F(t m ,u} n )\'‘ 


m =0 n =0 


M 


+7 \ a ( tri 


(53) 


m—0 ) 

Thus, a can be estimated in a closed form as, for all m = 0,..., M, 

2 E ®(du F {tm,Un)[dtF(t m ,U n )-i2™nF(tm,Un)]) 

a k+ l(tm) = ^- 

E \duF{t m ,u] n )\ 2 +7//X 

n =0 


(54) 
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Algorithm 1 FISTA algorithm for T~L a : F = FISTA(i r 'o, a, e) 
Choose a stopping value e. 

The initial values are F 0 £ C ( ~ N+1 ^ M+1 '>, z 0 = F 0 

Evaluate the Lipschitz constant L = ||VC? Q || 2 by power iterations. 

while > e do 

Gradient step: F k+ 1/ 2 £- z k - fVt/ a | 2)c (see ([46]) and ©); 

Proximal step: F k+1/2 <- F k+1/2 (l - , F ^f /2 | ) 5 
Monotonic step: 

if H{F k+ 1 / 2 ,a) < H(F k ,a) then 

Fk+l = Fk+1/2 

else 

F k +i = F k 

end if 


Relaxation step: z k+1 +- F k+1 + k + 2 {F k+1 - F k ) 


k = k + 1 ; 

end while 

Output F. 


fc+1 ( TT* * 
fc + 2 ^ fe + 1/2 


- n); 


Algorithm 2 Algorithm for minimization of T~L 

Choose a stopping value ei for the FISTA algorithm; 

Choose a stopping value e 2 for the alternating minimization; 
Choose a set of decreasing values for the parameter jl £ R + . 
Choose the parameters A £ [0,1] and 7 £ R+; 

The initial values are k = 0, F 0 = 0 , ao = 0 ; 
for p, £ Ip do 

while > ei do 

FISTA step: F k+1 = FISTA (F k ,a k ,t{) (see Alg. [I]): 
alpha estimation step (see Eq. ([54])) ; 
k = k 1 ; 

end while 
end for 

Output F, a ; 


4-6. General algorithm 

We summarize in Algorithm [ 2 ] the practical procedure to minimize H (25). The 
choices of the parameters are discussed below. 


• Stopping criterion. As the functional F 1 —> H(F,a) is convex, a good stopping 
criterion for FISTA is the so-called duality gap. However, the duality gap cannot 
be computed easily here. We then choose the classical quantity, t° st°P 

the FISTA inner loop as well as the alternating algorithm; that is, the algorithm 
stops when both the stopping criteria, ^+^ 7 ^ < and < £2 for the 

chosen ei, e 2 > 0, are satisfied, ei and e 2 can be set to 5 x 10 -4 in practice: smaller 
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value produce a much slower algorithm for similar results. 

• Set of values I A. A practical choice of Ip is a set of I\ values uniformly distributed 
on the logarithmic scale. In the noise free case, one must choose a sufficiently small 
p. However, a small value of p gives a very slow algorithm. A practical strategy 
is to use a fixed point continuation [25] . also known as warm start, strategy to 
minimize B. If the noise is taken into account, the final p cannot be known in 
advance, but can be chosen to be the one leading to the best result among the K 
obtained minimizers. Here, we choose p according to the discrepancy principle m- 
Another approach could be the GSURE approach [T9] (not derived in this work). 

• Parameter A. This parameter must be chosen between 0 and 1. The closer A is 
to 1 , the more importance is given to the constraints on the derivatives. As these 
constraints should be satisfied as much as possible, we choose in practice A ~ 0.99. 

• Parameter 7 . The influence of this parameter is not dominant on the results. We 
set 7 ~ 10 -3 in order to prevent any division by 0 during the estimation of a by 

@. 

• Initialization of the algorithm. The choice of a = 0 appears to be natural, 
as we cannot have access to the chirp factor. The first iteration of Algorithm [2] is 
equivalent to an estimation without taking this chirp factor into account. However, 
this initialization can have some influence on the speed of the algorithm [5j. As the 
solution is expected to be sparse, F = 0 seems to be a reasonable choice. 


5. Numerical Results 

In this section we show numerical simulation results of the proposed algorithm. The 
code and simulated data are available via request. In this section, we take W to be the 
standard Brownian motion defined on [0,oo) and define a smoothed Brownian motion 
with bandwidth a > 0 as 


<f> a :=W*K a , (55) 

where K a is the Gaussian function with the standard deviation <7 > 0 and * denotes the 
convolution operator. 

5.1. Single component, noise-free 

The first example is a semi-real example which is inspired from a medical challenge. 
Atrial fibrillation (Af) is a pathological condition associated with high mortality and 
morbidity [321 . It is well known that the subject with Af would have irregularly irregular 
heart beats. In the language under our framework, the instantaneous frequency of the 
electrocardiogram signal recorded from an Af patient varies fast. To study this kind of 
signal with fast varying instantaneous frequency, we pick a patient with Af and determine 
its instantaneous heart rate by evaluating its R peak to R peak intervals. Precisely, if the 
R peaks are located on t,;, we generate a non-uniform sampling of the instantaneous heart 
rate and denote it as {ti, l/(ti+i — tf)). Then the instantaneous heart rate, denoted as 
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< j)\(t ), is approximated by the cubic spline interpolation. Next, define another a random 
process A 1 on [0, L\ by 


A-i (t) — 1 + 


ft) + II^<tiI1l°°[0,L] 


(56) 


where t G [0, L] and o\ > 0. Note that A\ is a positive random process and in general 
there is no close form expression of A\ ( t ) and <pi ( t). The dynamic of both components 
can be visually seen from the signal. We then generate an oscillatory signal with fast 
varying instantaneous frequency 


Aft) = Ai(t) cos(27n/>i(t)), 


(57) 


where A\(t) is a realization of the random process defined in (56). We take L = 80, 
sample A with the sampling rate At = 1/10, ay = 100, 02 = 200. To compare the 
result with other methods, in addition to showing the result of the proposed algorithm, 
we also show the analysis results of STFT and synchrosqueezed STFT. In the STFT and 
synchrosqueezed STFT, we take the window function g as a Gaussian function with the 
standard deviation a = 1. See Figure [l] for the result. We mention that in this section, 
when we plot the tvPS, we compress its dynamical range by the following procedure. 


Denoted the discretized tvPS as 91 € 


where m. n G N stand for the number of 


discrete frequencies and the number of time samples, respectively. Set M to be the 
99.9% quantile of the absolute values of all entries of 91, then normalize the discretized 


tvPS by M, and obtain 91 G 


so that := max{M, 9\(i, j)} for i = 1 ,..., m 


and j = 1,... ,n. Then plot a gray-scale visualization of 91 in the linear scale. From 
the figure, we see that the proposed algorithm Tycoon could extract this kind of fast 
varying IF well visually. However, although there are some periods where STFT and 
synchrosqueezed STFT show a dominant curve following the IF well, in general the IF 
information is blurred in their TF representations. In addition, by Tycoon, the chirp 
factor can be approximated up to some extent. 


5.2. Two components, noise-free 

In the second example, we consider an oscillatory signal with two gIMTs. Define 
random processes A 2 (t) and </> 2 (t) on [0,L] by 


A 2 (t) — 1 + 

(j) 2 {t) =7rt- 


^(i) 


■ 211 <E>, 


<7 1 ||l oo [o,.l] 


3|| < 1 > ( t 1 ||l°°[o,l] 

rt r $ CT2 (s) + 0.5||$ 


(58) 


o-2l|i“[0,L] 


1-5||4> 0 . 2 IU o °[o,l] 


— sin(s) 


ds, 


where t € [0, L] and a 2 > 0. Note that by definition (f> 2 are both monotonically increasing 
random processes. The signal is constructed as 


f{t) = h(t) + f 2 (t), (59) 

where f 2 (t) = A 2 (t) cos(27r</>2(i))X[20,80](0 an d X is the indicator function. Again, we 
take ay = 100, a 2 = 200, L = 80 and sample / with the sampling rate At = 1/10. The 
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Figure 1: Top: the signal f± is shown as the gray curve with the instantaneous frequency superimposed 
as the black curve. It is clear that the instantaneous frequency varies fast. In the second row, the 
short time Fourier transform with the Gaussian window with the standard deviation 1 is shown on the 
left and the synchrosqueezed short time Fourier transform is shown on the right. In the third row, the 
Tycoon result is shown on the left and our result with the instantaneous frequency superimposed as a 
red curve is shown on the right. In the bottom, the chirp factor, (j )'^(£), is shown as the gray curve and 
the estimated ^(t); that is, the a(t), is properly normalized and superimposed as the black curve. In 
the top and bottom figures, for the sake of visibility, only the first part of the signal is demonstrated. 


result is shown in Figure [2j For the comparison purpose, we also show results from other 
TF analysis methods. In STFT and synchrosqueezed STFT, the window function is the 
same as that in the first example - the Gaussian window with the standard deviation 
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a = 1. We also show the result with the synclirosqueezed CWT maim, where the mother 

1 

wavelet if is chosen to satisfy if(^) = e ( 0.2 ) “ _1 X[o.8,i.2] > where x is the indicator function. 
Further, the popular empirical mode decomposition algorithm combined with the Hilbert 
spectrum (EMD-HS) [30] is also evaluated. The tvPS of / determined by EMD-HS is via 
the following steps. First, we run the proposed sifting process and decompose the given 
signal / into K components and the remainder term (see [3D] for details of the sifting 
process); that is, f(t) = J2^=i x k(t) + r(t ), where G N is chosen by the user, x k is 
the fc-th decomposed oscillatory component and r is the remainder term. The IF and 
AM of the /c-th oscillatory component is determined by the Hilbert transform; that is, 
by Xk(t) = Xk(t) + iH(xk{t)) = &fc(i)e* 27r ^ fc ^, where TL is the Hilbert transform, the IF 
and the AM of the fc-th oscillatory component are estimated by if' k (t) and b k (t). Here 
we assume that x k is well-behaved so that the Hilbert transform works. Finally, the 
tvPS (or called the Hilbert spectrum in the literature) of the signal / determined by 
the EMD-HS, denoted as S)f, is set to be Sjf(t,w) = J2k=i bk(t)S(u — if) k (t)(t)). In this 
work, due to the well-known mode-mixing issue of EMD and the number of components 
is not known a priori, we choose K^ = 6 so that we could hope to capture all needed 
information. We mention that one possible approach to evaluate the IF and AM after the 
sifting process is applying the SST directly to x k (t)\ this combination has been shown 
useful in the strong field atomic physics [45]. The results of STFT, synclirosqueezed 
STFT, synchrosqueezing CWT and EMD-HS are shown in Figure [3] Visually, it is 
clear that the proposed convex optimization approach, Tycoon, provides the dynamical 
information hidden inside the signal /, since the IFs of both components are better 
extracted in Tycoon, while several visually obvious artifacts could not be ignored in 
other TF analyses. For example, although we could see the overall pattern of the IF of 
f 2 in the STFT, the interfering pattern could not be ignored. While the IF of f 2 could 
be well captured in synclirosqueezed CWT, the IF of /1 is blurred; on the other hand, 
while the IF of fi could be well captured in EMD-HS, the IF of f 2 is blurred. Clearly, 
the IF patterns of both components could not be easily identified in the synclirosqueezed 
STFT. 


5.3. Performance quantification 


To further quantify the performance of Tycoon, we consider the following metric. As 
indicated above, we would expect to recover the itvPS. Thus, to evaluate the performance 
of Tycoon and have a comparison with other TF analyses, we would compare the time 
varying power spectrum (tvPS) determined by different TF analyses with the itvPS of 
the clean simulated signal s. If we view both the itvPS and the tvPS as distributions 
on the TF-plane, we could apply the Optimal Transport (OT) distance, which is also 
well known as the Earth Mover distance (EMD), to evaluate how different the obtained 
tvPS is from the itvPS m- We would refer the reader to [49l section 2.2] for its detail 
theory. Here we quickly summarize how it works. Given two probability measures on 
the same set, the OT-distance evaluate the amount of “work” needed to “deform” one 
into the other. Precisely, the OT-distance between two probability distributions /i and v 
on a metric space (S, d ) involves an optimization over all possible probability measures 
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on S x S that have p and v as marginals, denoted as V(p, v), by 


doT(n,v)'-= inf / d(x,y)dp(x,y) , (60) 

which in the one-dimensional case, that is, when ScK, and d is the canonical Euclidean 
distance, d(x,y) = \x — y\, could be easily evaluated. Define f^(x) = f_ dp and 
f v (x) = di/, the OT distance is reduced to the L 1 difference of and /„; that is, 


doT{P,v)=J -f„(x)\dx . (61) 

In the TF representation, as tvPS is always non-negative, we could view the distribution 
of the tvPS at each time as a probability density after normalizing its L 1 to 1. This 
distribution indicates how accurate the TF analyses recover the oscillatory behavior of 
the signal at each time. Thus, based on the OT distance, we consider the following 3) 
metric to evaluate the performance of each TF analyses of the function / by 


£> := 100 x 


doT(Pj,Pf) dt, 


(62) 


where P}{u) := P}(u) := S f (t,u) is the itvPS and S f (t,u) 

is the estimated tvPS by a chosen TF analysis. Clearly, the small the 3) metric is, the 
better the itvPS is approximated. 

To evaluate the second example, we run STFT, synchrosqueezed STFT, synchrosqueez¬ 
ing CWT and Tycoon on 100 different realizations of / 2 in (59), and evaluate the 3) 
metric. The result is displayed in (mean ± standard deviation). The 3> metric between 
the itvPS and the tvPS determined by Tycoon (respectively, EMD-HS, STFT, syn¬ 
chrosqueezed STFT and synchrosqeezed CWT) is 6.06 ± 0.25 (respectively, 7.18 ± 0.93, 
8.76 ± 0.41, 8.13 ± 0.42 and 7.36 ± 0.67). Further, under the null hypothesis that there 
is no performance difference between the tvPS determined by Tycoon and STFT eval¬ 
uated by the 3) metric and we set the significant level at 5%, the t-test rejects the null 
hypothesis with the p-value less than 10 -8 . The same hypothesis testing results hold for 
the comparison between Tycoon and other methods. Note that while the performance 
of Tycoon seems better than EMD-HS, the 3) metric only reflects partial information 
regarding the difference and more details should be taken into account to achieve a fair 
comparison. For example, if we set K# = 2, the 3) metric between the itvPS and the 
tvPS determined by EMD-HS becomes 4.98 ± 0.81, which might suggest that EMD-HS 
performs better. However, this “better performance” is not surprising since the sparsity 
property is perfectly satisfies in EMD-HS, which is inherited in the procedure, while the 
mode mixing issue might lead to wrong interpretation eventually. Note that it is also 
possible to post-process the outcome of the sifting process to enhance the result, but 
these ad-hoc post-processing again are not mathematically well supported. Since it is 
out of the scope of this paper, we would leave this overall comparison between different 
TF analyses based on different philosophy as well as a better metric to the future work. 
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Figure 2: Top: the signal / is shown as the gray curve with fa superimposed as the black curve which 
is shifted up by 4 to increase the visualization. It is clear that the instantaneous frequency (IF) also 
varies fast in both components. In the bottom row, the intensity of the time frequency representation, 
|fj/| 2 , determined by the proposed Tycoon algorithm is shown on the left; on the right hand side, the 
instantaneous frequencies associated with the two components are superimposed on | Rf | 2 as a red curve 
and a blue curve. 


5-4■ Two component , noisy 

In the third example, we add noise to the signal / and see how the proposed algorithm 
performs. To model the noise, we define the signal to noise ratio (SNR) as 


SNR := 20 log 10 


std(/) 

std($)’ 


(63) 


where / is the clean signal, $ is the added noise and std means the standard deviation. 
In this simulation, we add the Gaussian white noise with SNR 7.25 to the clean signal 
/, and obtain a noisy signal Y. The result is shown in Figure [4] Clearly, we see that 
even when noise exists, the algorithm provides a reasonable result. To further evaluate 
the performance, we run STFT, synchrosqueezed STFT, synchrosqueezing CWT and 


Tycoon on 100 different realizations of fa in (59) as well as 100 different realizations 
of noise, and evaluate the X> metric. Here we use the same parameters as those in 
the second example to run STFT, synchrosqueezed STFT and synchrosqueezed CWT. 
Since it is well known that EMD is not robust to noise, we replace the sifting process 
in EMD by that of the ensemble EMD (EEMD) to decompose the signal into K ^ = 6 
oscillatory components, and generate the tvPS by the Hilbert transform as that in EMD. 
We call the method EEMD-HS. See [ST] for the detail of the EEMD algorithm. The S 
metric between the itvPS and the tvPS determined by Tycoon (respectively, EEMD-HS, 
STFT, synchrosqueezed STFT and synchrosqeezed CWT) is 11.87 ± 0.74 (respectively, 
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11.65 ± 0.63, 14.53 ± 0.55, 14.09 ± 0.58 and 12.79 ± 0.69). The same hypothesis testing 
shows the significant difference between the performance of Tycoon and that of STFT, 
synclirosqueezed STFT and synchrosqeezed CWT, while there is no significant difference 
between the performance of Tycoon and that of EEMD-HS. Again, the same comments 
for the comparison between Tycoon and EMD-HS carry here when we compare Tycoon 
and EEMD-HS, and we leave the details to the future work. 


6. Discussion and future work 


In this paper we propose a generalized intrinsic mode functions and adaptive harmonic 
model to model oscillatory functions with fast varying instantaneous frequency. A convex 
optimization approach to find the time-frequency representation, referred to as Tycoon 
algorithm, is proposed. While the numerical results are encouraging, there are several 
things we should discuss. 


1 . 


2 . 

3. 

4. 


While with the help of FISTA the optimization process can be carried out, it is 
still not numerically efficient enough for practical usage. For example, it takes 
about 3 minutes to finish analyzing a time series with 512 points in the laptop, 
but in many problems the data length is of order 10 5 or longer. Finding a more 
efficient strategy to carry out the optimization is an important future work. One 
possible solution is by the sliding window idea. For a given long time series / of 
length n and a length m < n, we could run the optimization consecutively on the 
subinterval Ij := [j — m, j + m] to determine the tvPS at time j. Thus, the overall 
computational complexity could be 0(E(m)n), where F(m) is the complexity of 
running the optimization on the subinterval Ij. 

When there are more than one oscillatory component, we could consider (271 to 
improve the result. However, in practice it does not significantly improve the result. 
Since it is of its own interest, we decide to leave it to the future work. 

While the Tycoon algorithm is not very much sensitive to the choice of parameters 
/r, A and 7 , how to choose an optimal set of parameters is left unanswered in the 
current paper. 

The noise behavior and influence on the Tycoon algorithm is not clear at this 
moment, although we could see that it is robust to the existence of noise in the 
numerical section. Theoretically studying the noise influence on the algorithm is 
important for us to better understand what we see in practice. 


Before closing the paper, we would like to indicate an interesting finding about SST 
which is related to our current study. When an oscillatory signal is composed of intrinsic 
mode type function with slowly varying IF, it has been studied that the time-frequency 
representation of a function depends “weakly” on a chosen window, when the window has 
a small support in the Fourier domain DUE!. Precisely, the result depends only on the 
first three absolute moments of the chosen window and its derivative, but not depends on 
the profile of the window itself. However, the situation is different when we consider an 
oscillatory signal composed of gIMT function with fast varying IF. As we have shown in 
Figure [2j when the window is chosen to have a small support in the Fourier domain, the 
STFT and synchrosqueezed STFT results are not ideal. Nevertheless, nothing prevents 
us from trying a window with a small support in the time domain; that is, a wide support 
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in the Fourier domain. As is shown in Figure [5] by taking the window to be a Gaussian 
function with the standard deviation 0.4, STFT and synchrosqueezed STFT provide 
reasonable results for the signal / considered in ( [59] ) . Note that while we could start to see 
the dynamics in both STFT and synchrosqueezed STFT, the overall performance is not 
as good as that provided by Tycoon. Since it is not the focus of the current paper, we just 
indicate the possibility of achieving a better time-frequency representation by choosing 
a suitable window in SST, but not make effort to determine the optimal window. This 
kind of approach has been applied to the strong field atomic physics H5 S 35], where the 
window is manually but carefully chosen to extract the physically meaningful dynamics. 
A theoretical study regarding this topic will be reported in the near future. 
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Figure 3: The time frequency (TF) representations® different TF analyses on the signal /. In the first 
row, on the left, the short time Fourier transform (STFT) with a Gaussian window with the standard 
deviation a = 1 is shown, and on the right the IF’s of both components are superimposed for the visual 
comparison. In the second row, on the left, the synchrosqueezed STFT with a Gaussian window with 
the standard deviation a = 1 is shown, and on the right the IF’s of both components are superimposed 

for the visual comparison. In the third row, on the left, we show the synchrosqueezed continuous wavelet 

i 

transform with the mother wavelet IP so that i/j(£) = e ( ' 0.2 1 _1 X[o. 8 , 1 . 2 ] > where \ ' s the indicator 
function, and on the right the IF’s of both components are superimposed for the visual comparison. It is 
clear that the slowly oscillatory component is not well captured. In the bottom row, on the left, we show 
the TF representation determined by the empirical mode decomposition with the Hilbert transform, and 
on the right the IF’s of both components are superimposed for the visual inspection. It is clear that the 
fast oscillatory component is not well captured. 













































Figure 4: Top: the noisy signal Y is shown as the gray curve with the clean signal / superimposed as the 
black curve.In the second row, the intensity of the time frequency representation, |i?y| 2 , determined by 
our proposed Tycoon algorithm is shown on the left; on the right hand side, the instantaneous frequencies 
associated with the two components are superimposed on |-R^| 2 as a red curve and a blue curve. 
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Figure 5: Left: the intensity of the short time Fourier transform (STFT) with a Gaussian window with 
the standard deviation cr = 0.4 is shown on the left and the intensity of the synchrosqueezed STFT is 
shown on the right. 
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Appendix A. Proof of Theorem |2.1| 


Suppose 


g{t) = a(t) cos <j>(t) = ( a(t ) + a(t)) cos ((/>(t) + 0(t)) £ Q° 1,C2,C3 . (A.l) 

Clearly we know a £ C' 1 (R), j3 £ (7 3 (]R). By the definition of 1,C2,C3 , we have 

inf aft) > ci, supa(f) < C 2 , (A.2) 

*e R te R 

inf </>'(£) > d, sup0'(t) < c 2 , |^"(£)| < c 3 (A.3) 

*e R t S R 

|a'(i)| < C ^(i), |0"'(i)| < e^(i) (A.4) 


and 


inf [a(t) + a(t)} > ci, sup[a(f) + a(f)] < c 2 , 

*e R teR 

+ £'(£)] > ci, sup[</>'(£) + /?'(£)] < c 2 , |<6"(£) + /?"(£)| < c 3 
( e R teR 

|a'(£) + a'(£)| <e(^(£)+/3'(£)), |<T(t) + /*'"(*) | < e(^'(f) + /3'(i)). 


(A.5) 

(A.6) 

(A.7) 


The proof is divided into two parts. The first part is determining the restrictions on 
the possible [3 and a based on the positivity condition of and a(t), which is inde¬ 
pendent of the conditions (A.4) and (A.7). The second part is to control the amplitude 
of /? and a, which depends on the conditions (A.4) and (A.7). 


First, based on the conditions (A.2), (A.3), (A.5) and (A.6), we show how fj and a 
are restricted. By the monotonicity of 4>{t) based on the condition (A.3), define t m £ M, 
m £ Z, so that 4>{t m ) = (m + 1/2)7r and s m £ R, m £ Z, so that <j>(s m ) = mn. In other 
words, we have 


g(t m ) = 0 and g(s m ) = (-1 ) m a(s m ). 

Thus, for any n £ Z, when t = t n , we have 

(a(t n ) + a(t n )) cos (<f>(t n ) + fi(t n )) 

= ( a(t n ) + a(t n )) cos[mr + ir/2 + /3(t n )} 
= a(t n ) cos(?i7r + 7t/2) = 0, 


(A.8) 


where the second equality comes from (A.l). This leads to j3(t n ) = k n n, k n £ Z, since 
a(t n ) + a(t n ) > 0 by ( |A.6 |. 

Lemma 1. k n are the same for all n £ Z and k n are even. As changing the phase 
function globally by 2ln, where l £ Z, will not change the value of g(t n ) for all n £ Z, we 
could assume that /3(t m ) = 0 for all m £ Z. 

Proof. Suppose there exists t n so that f3(t n ) = kn and /3(f n+ i) = (k + l)n, where kj £ Z 
and l > 0. In other words, we have <j>(t n +i) = 4>{t n ) + (l + 1)7T. By the smoothness of /?, 
we know there exists at least one t' £ (t n ,t ni ) so that +fi(t') = (n + 3/2)n, but this 
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is absurd since it means that (a(t) + a{t)) cos(</(f) + (3{t)) will change sign in (t„, t n+ 1 ) 
while a(t) cos(<j>(t)) will not. 

Suppose k n is a hxed odd integer k, then since /3 £ C 3 (R) and /?(£„) = /3(t„+i) = kit, 
there exists t' £ (£„,£„+ 1 ) so that /3(t r ) = kir and hence 


a{t') cos {4>{t')) = (a(£ 7 ) + ot(t')) cos(</>(£') + /3(t 7 )) = —(a(t') + a(t')) cos 


which is again absurd since cos(<j)(t')) ^ 0 and the amplitudes are positive by (A.2) and 
(A.5). We thus obtain the second claim. □ 

Lemma 2. /3 r (t) is 0 or changes sign inside [t n , t n + 1 ] for all n £ Z. Furthermore, 
|/3 (t') — /3{t")\ < n for any t',t" £ [t m ,t m+ 1 ] for all m £ Z. 

Proof. By the fundamental theorem of calculus and the fact that /?(£„) = /3(f n+ 1 ) = 0, 
we know that 

r t ™+1 


0 = P(t n +i) - /3(£„) = 


/3\u)du. 


which implies the first argument. Also, due to the monotonicity of <f> + /3 (A.6), that is, 
(n + 1/2)7 t = <f>(t n ) + P(t n ) < + j3(t') < <^(£„ + i) + (3{t n+ i) = (n + 3/2)7r for all 

t' £ (t n ,t n + 1 ), we have the second claim 

\m - /3(*")i < tt. 


Indeed, if |/3 (ft') — j3{t")\ > tt, for some t',t" £ [£ n ,£ n+ i] and t' < t", we get an contradic¬ 
tion since </>(£") + f3(t") £ \{n + 1/2)7t, (n + 3/2)7t] while </>(£') + /3(£') £ [(n + 1/2)7r, (n + 
3/2)t r], □ 

Lemma 3. a ^ s a f^( s ) = cos(/3(s„)) /or all n £ Z. In particular, a(s m ) = 0 if and only 
if /3(s m ) = 0, to £ Z. 

Proof. When t = s m , we have 


(—l) m a(s m ) = a(s m ) cos(to7t) 

= (ct(s m ) + ce(s m )) cos[to7t + /3(s m )] 
= (—l) m (a(s m ) + a(s m )) cos(/3(s m )), 


(A.9) 


where the second equality comes from (|A.1|), which leads to a(s m ) > 0 since | cos(/3(s m ))| < 
1. 

Notice that (A.91 implies that /3(s m ) = 2fc m 7r, where £ Z, if and only if a(s m ) = 0. 
Without loss of generality, assume k m > 0. Since /3 £ C 3 (M), there exists t' £ s m ) 

so that /3(£') = n and hence 

a{t') cos {(fit 1 )) = {ait 1 ) + a{t')) cos {(j>{t') + ft (t 1 )) = ~{a{t') + a{t')) cos (</>(£')), 

which is absurd since cos(</(t')) ^ 0 and the positive amplitudes by ( A.2| ) and (A.5). 
Thus we conclude that /3{s m ) = 0. 


To show the last part, note that when a(s m ) > 0, 0 < cos(/3(s m )) = 


a(s„ 


a(s rn )+a(s m ) 


< i 


by (A.9). Thus, we know /3{s m ) £ (— tt/2, tt/2) + 2n m 7r, where n m £ Z. By the same 
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argument as in the above, if n m > 0, there exists t 1 £ (£ m _i,s m ) so that (3{t') = it and 
hence 


a(t') cos(</>(f')) = (a(t r ) + a(t'))cos((j>(t') + f3(t')) = — (a(t r ) + a(t')) cos(</>(f')), 


which is absurd since cos(</>(£')) ^ 0 and the positive amplitudes by (A.2) and (A.5). □ 


Lemma 4. a ^ t °^ f or all n £ 7. In particular, a(t n ) = 0 if and only 

if P'(t n ) = 0, n £ 7. 


Proof. For 0 < x <C 1, we have 


(a(t n + x) + a(t n + x)) cos (cj>(t n + a;) + (3(t n + x)) = a(t n + x) cos (<f>(t n + x)), 


which means that 


a(t n + x) _ cos((f){t n + x) + j3(t n + x)) 
a(t n + x) + a(t n + x ) cos {<j>(t n + x)) 

By the smoothness of </> and (3, as x —> 0, the right hand side becomes 

J. COS (<j)(t n + x) + P(t n + x)) 

®->0 COs{(j)(t n + x)) 

_ J. + a:) + P'(t n + a;) sin (4>(t n + x) + /3(t n + a;)) 

x->-o 4>'{t n + x) sin.(<f(t n + x)) 

_ <t>'(t n ) + f}'{tn) 

&{t n ) 

Thus, since a(t n + x) + a(t n + x) > 0 and a(t n + x) > 0 for all x, we have 

a{tn) _ (t>'{tn) + /3'{t n ) 
a(t n ) a(t n ) cj) (t n ^j 


□ 


Lemma 5. 0"(t) is 0 or changes sign inside [t n , t n + 1 ] for all n £ Z. 

Proof. This is clear since /?'(£) is 0 or changes sign inside [t n , t n+ 1 ] for all n £ 7 by 
Lemma [2j □ 


In summary, while /3(t m ) = 0 for all m £ 7, in general we loss the control of a at t m . 
On s m , a is directly related to /? by Lemma[3] on t m , a is directly related to (3 1 by Lemma 
[4j We could thus call t rn and s m the hinging points associated with the function g. Note 
that the control of a and /3 on the hinging points does not depend on /3"’s condition. 

To finish the second part of the proof, we have to consider the conditions (A.4) and 


(A.7I. 


Lemma 6. |a(t)| < 27re for all t £ R. Further, we have |/3'(t n )| < for all 

n £ 7. 
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Proof. Suppose there exists t' so that a(t') > 27re. The case a{t') < — 27re can be proved 
in the same way. Take m £ Z so that t' £ (t m ,t m+ 1 ]. From ( |A~4| ) and ( |A.7| ) we have 

|a'(t)|<e(20'(t)+/3'(t)). 

Thus, take t £ {t m ,t m+ 1 ). Without loss of generality, we could assume t £ (t m ,t'), we 
have by the fundamental theorem of calculus 

\a(t') — a(t) | < J |a'(u)|du < e[2<j>(lf) - 2<f>(t) + /3(t') - /3(t)] 

< e[(4>(t m+ 1 ) + /3(t m+ 1 ) - - /3{t m )) + {(j){t m+ 1 ) - <f>(t m ))] < 2ne, 

where the last inequality holds due to the fact that <t> + /3 and <p are both monotonic and 
Lemma [2] This fact leads to aft) > 0 for all t £ (t m ,t rn + 1 ]. Since /3(t m ) = 0 for all 
m £ Z, there exists t £ (t m ,t m +i) such that cos(</>(t) + 0{t)) > cos ((j>(t)). However, by 
the assumption and the above derivatives, we know that 

1 > a (i) = cos (cj>(i) + P{t)) 
a(t) + a(t) cos ’ 

which is absurd. Thus, we have obtained the first claim. 

The second claim could be obtained by taking Lemma [4] into account. Indeed, 
since f3'(t n ) = a (7')+l"(t ) a fa) and |a(i)| < 27re, when e is small enough, |/3 , (f n )| < 




47 TCp' (t n ) 


(t„) \ a ttri)\ - a (t„) 


□ 


Thus we obtain the control of the amplitude. Note that the proof does not depend 
on the condition about /3". 

Lemma 7. \P"(t)\ < 2ne, |/? 7 (t)| < and \/3(t)\ < ^ for all t £ R. 

Proof. Suppose there existed t' £ (t m ,t m+ 1 ) for some m £ Z so that |/3"(t')| > 37re. 
Without loss of generality, we assume f)"(t') > 0. From (A.4) and (A.71 we have 

Thus, by the fundamental theorem of calculus, for any t £ {t rn , t r ). we know 
\/3"(t') - j3"(t)\ < J \P"'(u)\du < e ^ (2(f'{t) + (3'(t))du < 2ne, 

where the last inequality holds due to Lemma[2]and the fact that <f>(t') — <f>(t) < <t>(t m + 1 ) — 
<j>{t m ) = 7r and | /3ft 1 ) — (3{t)\ < n from Lemma [2j Similarly, we have that for all t £ 
(t',t m+ 1 ), |/3 ,, (t / ) — P"(t)\ < 2ire. Thus, f3"(t) > 0 for all t £ [t m ,t m+ 1 ], which contradicts 
the fact that (3"(t) must change sign inside [t rn . t m+ \\ by Lemma [5 

With the upper bound of | /3"\, we immediately have for all t £ [t m ,t m + 1 ] that 

I P'{t) - /3'(t m ) | < |/3"(u)|d u < 2n(t - t m )e. 

To bound the right hand side, note that t—t m < t TO+ i— t m < , where t' £ [t m ,t TO +i]. 


Since \j3"{t)\ < 2ne 1 when e is small enough, 


< 


2tt 




Thus, |/3'(f) - < 


2n(t - t m )e < 


47T 


To finish the proof, note that by Lemma 


Similarly, we have the bound for /?. 


3 1/?'(4 


“ a(t n ) 

□ 
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Appendix B. Proof of Theorem |2.2| 


Cl,C2,C3 

e,d ’ 

loss the control of the hinging points for each gIMT like those, t m and s m , in Theorem 


When there are more than one gIMT in a given oscillatory signal / £ Q'fi, "’” J , we 


2.1 


So the proof will be more qualitative. Suppose / = / £ Q C 1 £ J 2 ’ 3 , where 


N 


M 


f(t) = ^aj(i) cos[27r<fo(t)], f(t) ='^2A l (t)cos[2nipi(t)}. 


1 = 1 


1 = 1 


Fix t 0 £ K. Denote f to := /w> fto : = E;=i /w 


yM 


ft 0 ,i(t) := a;(t 0 ) cos 


2tt ( <j)i(to) + - t 0 ) + 0"(t 0 ) 


(t - to) 2 


and 


ft 0 ,i(t) ■■= Ai(to) cos 


2tt (<^i(t 0 ) + <Pi(to)(t - t 0 ) + <Pi(t 0 ) 


(t-tof 


Note that ft 0 ,i is an approximation of ai(t) cos[27T0/(t)] near to based on the assumption of 
2ed C2 ’ C3 • where we approximate the amplitude o; (t) by the zero-th order Taylor expansion 
and the phase function <f>i (t) by the second order Taylor expansion. To simplify the proof, 
we focus on the case that |^>"(t 0 )| > e|$(t 0 )| and |y>"(f 0 )| > e|<^(to)| for all l. For the 
case when there is one or more l so that |^>"(to)| < e\(f> , l (to)\, the proof follows the same 
line while we approximate the phases of these oscillatory components by the first order 
Taylor expansion. 

Recall that the short time Fourier transform (STFT) of a given tempered distribution 
f £ S' associated with a Schwartz function g £ S as the window function is defined as 

Ff (g) (t, g) := [ f(x)g(x — t)e~ i2nr,x dx. 

Jr 


Note that by definition /, f to , ft 0 £ S'. To prove the theorem, we need the following 
lemma about the STFT. 


Lemma 8. For a fixed to £ R, we have 


V, 


i9) (r, V )-V^(r,v) 


= 0(e). 


where C is a universal constant depending on C\, C 2 and d. 


Proof. Fix a time to £ K. By the same argument as that in mm and the conditions of 
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Q^f 2 ’ 03 , we immediately have 


I V { f s) {T,ri) - V^(t,ii) | 

[ m-f t 0 (t))g(t-T)e- i2 ^dt 
JR 


< 



ai(t 0 )) cos[27 T(t)i{t)]g{t - t 0 )e l27vr,t dt 


N 

E 

z=i 


= 0 ( 0 , 


0,1 {to) ( cos[27n^(t)] - cos[27r(^(f 0 ) + 0 / (£ o )(* — *o) 


+ ^'{to){t - t 0 ) 2 )])g{t - t 0 )e- i2 ^dt 


where the last term depends only on the first few absolute moments of g and g'. d, c\ 
and C 2 . □ 

With this claim, we know in particular that Vf 9 \to,rj) = Vf 9 \t 0 ,g) + O(e). As a 

J J t o 

result, the spectrogram of / and f to are related by 

\v} 3 \t 0 ,v)\ 2 = \V^{to,v)\ 2 +0(e). 

Next, recall that the spectrogram of a signal is intimately related to the Wigner-Ville 
distribution in the following way 


\ v ${t,v)\* = J J WV ft0 (x,QWV g {x-T,t-r,)dxd£, 

where the Wigner-Ville distribution of a function h in the suitable space is defined as 

WV h {x,C) := J h(x + r/2)h* (x — r/ 2 )e _i 27 rT ^dr. 


Lemma 9. 

described in 


Take g{t) = (2U ) 1 / 4 exp ncrt 2 }, where a = C 3 . 
\B.S), we have 


When d is large enough 


\v} 9 \t 0 ,v)\ 2 = L{t 0 ,rj)+ e and \V ( ~ g) (t 0 , r ?)| 2 = L{t 0 , g) + e, 


where 



N 

L{t 0 ,v) ■= E a ?( io )\ 

1=1 V 

M 

1 a 

f 27rcr(^(t 0 )-r;) 2 \ 

and 

/ 2( CT 2 + $'(t 0 )2) 6XP < 

l cr 2 + 0"(t o ) 2 J 


L{t 0 ,v) ■■= E^Ooh, 
1=1 v 

/ J 

27rcr(<^(f 0 ) - r ?) 2 } 


2(cr 2 + <^"(t 0 ) 2 ) 6XP j 

a 2 + ip'/ (t 0 ) 2 J 
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Proof. By a direct calculation, the Wigner-Ville distribution of the Gaussian function 
g(t) = (2a) 1 / 4 exp {— 7 rut 2 } with the unit energy, where a > 0, is 

—27t ^ax 2 + 

similarly, the Wigner-Ville distribution of ft 0 j is 

WV ftoJ (x,0 = a?(to)^'(to)+^'(to)(ic-to)(0- 

Thus, we know 


WVg( x,€) = 2 exp 





(B.l) 


f J WV fto ,(x,£)WV g (x - to,£ - v)dxdf, 

J J (af(to)^(t 0 ) +? i"(t 0 )(x-t 0 )(0) 2exp|-27T (V(x - t 0 ) 2 + ^ ^ ^ | d£da; 


=2af(t 0 ) J exp|-27r ^a(x - t 0 ) 2 + 
=a 2 (t 0 ) 


2 , ($(to) + </>"(to)(2:-to)-»?) 5 


da- 


2(cr 2 + </>"(to) 2 ) 


exp < - 


2tt(j 


W(t 0 y 


mt 0 )-r,y 


Thus, we have the expansion of YI1L1 I Vf 3 ^ (to, ??)| 2 , which is L(to,r)). Next, we clearly 

JtQ ,1 

have 


N 

b:v,7>i 2 )p 

= 


Z=1 


k^l 


k^l 


O 2 + o ) 2 


To bound the right hand side, note that (B.l) implies 

As a result, V ff 0tl (to, v) V/ 3 ] k (to, v) 

ak(t 0 )ai(t 0 )a 1/2 


becomes 


exp < —7r a 


Y _ 

^(4(u 2 + ^(t 0 ) 2 )(a 2 + ^(to) 2 )) 1/4 

which is a smooth and bounded function of ?y and is bounded by 


mto)-v? + m 0 )~ v) 2 


a 2 + V T y l 


<T 2 + 4>” (t 0 ) 5 


r 2 

c 2 


V^cr 1 / 2 


^expi- 


k^l 


KG 


cr 2 + Ct 


[(^fc(to)-??) 2 + (</>!(to)-??) 2 


(B.2) 
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Suppose the maximum of the right hand side is achieved when 77 
ko = argmin; =1) .. i jv-i(0;_i_i(4o) — o))- To bound (B.2), denote 7 = 

the notation. Clearly, the summation in (B.2) is thus bounded by 


= 0 fc o Oo), where 
to simplify 


OO OO OO 

2 ^2 e - ' 27 ^ + 2 J2 e ~ k2ld2 ^ e "^ 2 < 2 (Q + 1)5, 

Z=1 fc=l 1=0 


where Q = / 0 °° e l2 " /d ~ t dt = and S = f2 e 1 2 ' yd ? t <it < • Note that here 

we take the bounds X^o < Q and e - ^ 7 ^ < 5. Thus, we conclude that 

\/2c 2 

the interference term is bounded by t/ | (Q + 1)5. To finish the proof, we require the 
interference term to be bounded by e, which leads to the following bound of d when we 
take cr = C 3 : 


d> \/2 In C 2 + - In C 3 — In e. 


We have finished the proof. 


(B.3) 


□ 


Since to is arbitrary in the above argument and the spectrogram of a function is 


unique, we have \Vf 9 \t 0 ,^)\ 2 = |V~ 9 ,l (fo,? 7)| 2 and hence 

I L{t 0 ,ri) - L(t 0 ,r])\ = 0(e). 
With the above claim, we now show M = N. 

Lemma 10. M = N. 


(9)/ 


(B.4) 


Proof. With <7 = C3, by Lemma |9j for each l = 1 ,...,N, there exists a subinterval 
/;(to) around 0((to) so that on Ii(t 0 ), L(t 0 ,r]) > 


a l Oo) c 3 


> 


= . Similarly, 


2 v / 2(c|+0"(t o ) 2 ) 1yjlc%+1c\ 

for each 1 = 1,..., M, there exists a subinterval J/(to) around <^(^ 0 ) so that on J;(t 0 ), 
L(t 0 ,ri) > — 7 =!==? > ==• Thus, when e is small enough, in particular, 

2 V 2 (°3+^I (*<>) ) 2V2c|+2c 2 


e < 


2y / 2c|+2c; 


the equality in 


B.4 1 cannot hold if M 7 ^ TV. 


□ 


With this claim, we obtain the first part of the proof, and hence the equality 

N N 

fit) = y^aijt) cos[2t T^ijt)] = y^Aijt) cos[27ry?/(t)] G Q^/ 2,C3 . (B.5) 

z=i 1=1 

Now we proceed to finish the proof. Note that it is also clear that the sets I;(to) and 


Ji (to) defined in the proof of Lemma 10 satisfy that Ii(t 0 ) fl Ifc(fo) = 0 for all l ^ k. 
Also, Ii(to) fl Ji (to) / 0 and Ii(to) D Jk(to) = 0 for all l ^ k. Indeed, if k = l + 1 and 
we have /;(f 0 ) n Ji+i(t 0 ) ^ 0, then L(t 0 ,i 7 ) > , ClC 2 3 y on Ji+i(t 0 )\Ii(t 0 ), which leads 

lyj ZC3+ZC2 

to the contradiction. By the ordering of <^(to) and hence the ordering of Ii(to), we have 
the result. 
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Take £ = 1 and r/ = <p\ (t 0 ). By Lemma|9] when d is large enough, on Ji(t 0 ) we have 


ii(t o) 


M(h) 


V'MPj 02(1 + 0/(t O ) 2 ) 

which leads to the fact that 

a l(to)4 


exp 


2 ?r 03(^1 (t 0 ) - 0 i(t o )) 2 

c§ + 0i(*o) 2 


^i(*o)c 


= O(e). 


V 2 ( c 3 + </>l(to) 2 ) 02(0 + 0l(to) 2 ) 

Indeed, without loss of generality, assume ai(«o)c 3 > 

6 j V / 2(c|+0' , (t o ) 2 ) - 02 

a 2 (t 0 )c| Aj(t 0 )4 


/2(cl+ V '/(t 0 ) 2 ) 


0(e), (B.6) 

(B.7) 
and we have 


< 


0 2 (0 + </>l( t o) 2 ) \/ 2 ( c 3 + 01 (to) 2 ) 

a 2 (t 0 )ct 


3 r v^i ; v V 3 

u 3 ^l(t 0 )c 


02 (c§ + 0/(to) 2 ) 02(0 + (///(to) 2 ) 


exp 


27rc 3 (^ , 1 (t 0 ) - </>/(t 0 )) 2 

0 + 0/(t o ) 2 


= 0(e) 


by (B.6) since 0 is the unique maximal point of the chosen Gaussian function. 
Lemma 11. |0(t) — 0(t) | = O(0e) for all time fgK and £ = 1,..., N. 


Proof. Fix t 0 £ R and £ = 1. By (B.6), (B.7) and the conditions of C00 2 ’ C3 , on 7i(t 0 ) 
we have 


A?(t„)c 


02(0 + 0/(t O ) 2 ) 


1 — exp < — 


2tt c 3 (0i(t o ) - (t 0 )) ; 

c| + 0/(t O ) 2 


= 0(e). 


Due to the fact that the Gaussian function monotonically decreases as 2,rC3 ^d t o) > 

c 3' ( Pl K^O) 

0, we have 

K(t 0 )-^(t 0 )) 2 = 

.3 ‘-I 


4 + Pi (to) 2 


Since 0/ is uniformly bounded by C 2 , we know 

I0i(to) - 0i(*o)| = 0(00 

By the same argument, we know that |0(t) — 0(t)| = O(0e) for all l = 1,..., N and 
teR. □ 

Lemma 12. |</>"(t) — p"(t) | = O(0e) for all time t £ R and £ = 1,..., N. 

Proof. Fix t 0 £ R and £ = 1. By the assumption that ft"(to) = O(e) and (//"(to) = 0(e), 
we claim that \ft{(to) — 0/(to)| = O(0e) holds. Indeed, we have 


ptQ-ti- pt oi -1 

01 (to + 1) = 01 (to) + / </>i(s)ds and 0i(to + 1) = 0i(to) + / ip'{(s)ds, 

J to Jt 0 

which leads to the relationship 


0i(t 0 + 1) ~ 0i (to + 1) — 0i (to) — 0i (to) + 
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r^o+l 


(0i( s ) — 0i( s ))ds. 


to 





































Therefore, by the assumption that (f'”{t 0 ) = O(e) and <p"'(t o) = O(e), we have 


(<%{s)-tf(s)) ds 


'to 




4>"{to) - p" (to) + W( x ) - Pi'{x)) ds 
= < t>"(to) ~ Pi (to) + O(e), 

which means that |^"(t 0 ) — Pi(to)\ = 0( v / e) since \<j>{(to + 1) — (p[(to + 1)| = 0(y/e) and 
IftAto) — p'i(to)\ = 0(y/e). By the same argument, we know that \<p"[t) — <t>"{t)\ = 0{y/e) 
for all Z = 1,..., N and i€t. □ 

Lemma 13. \ae(t) — Ag(t) \ = 0(y/e) for all time t G R and t = 1,... ,N. 


Proof. Fix t 0 S K and 1 = 1. From (B.7), it is clear that |ai(fo) — Ai(t 0 )| = Q(y / e) if 
and only if \<f)’{ (to) — Pi{to)\ = 0(y/e), so we obtain the claim by Lemma 
argument holds for all time tgR and £ = 2,..., N. 


12 


Similar 

□ 


Lastly, we show the difference of the phase functions. 

Lemma 14. \<f>g(t) — pg(t) \ = 0(y/e) for all time tgt and 1=1, 


,N. 


Proof. By (B.5) and the fact that |a;(t) — Ai(t)\ = 0(y/e), we have for all t G 


N 


N 


YMt) cos[2n(j>i(t)] = y^Q;(t) cos[27r(^;(t) + Cp(t))] + 0 (y/e 


;=i 


where cq G C 3 (M). Note that YiLi a i{t) cos[2n(cj>i(t) +a;(t))] G Qf^ff 2 ' 03 . Fix t 0 G M. 
Suppose there exists to and the smallest number k so that a*(to) = 0{y/e) up to multiples 
of 27 t does not hold. Then there exists at least one £ ^ k so that ag(t 0 ) = 0(y/t) does not 
hold. Suppose L > k is the largest integer that oil (to) = 0(y/e) does not hold. In this 
case, there exists t\ > to so that YiLi a/(ti) cos[27r</>z(ii)] = YliLi °z(ti) cos[27r(^(ti) + 
a/(ti))] + 0(y/e) does not hold. Indeed, as <j>' L (to) is higher than <j)' k {to) by at least d, 
we could find t\ = (f> k (cfkito) + c), where 0 < c < 7r, so that cos[27r(^L(fi) + az,(fi))] — 
cos[27r(0 i (t 1 ))] = cos[27r(0i(t o )+ai(t 0 ))] — cos[27r(0x,(to))] + 0{y/l) does not hold while 
a-z(t) c°s[27r^(t)] = ai(t) cos[2ir(<pi(t) + ai(t))\ + 0(y/e) holds. We thus get a 
contradiction and hence the proof. □ 


Appendix C. A convergence study of Algorithm [2] 

We provide here a simple convergence study of Algorithm [2j based on the Zangwill’s 
global convergence theorem [53] which can be stated as follow 

Theorem Appendix C.l. Let A be an algorithm on X, and suppose that, given 
xq G X, the sequence {xkY =1 is generated and satisfies 

x k+ i G A (x k ) ■ 

Let a solution set T be given, and suppose that 
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(i) the sequence {xk}^L 0 C S for S C X a compact set. 

(ii) there is a continuous function Z on X such that 

(a) if x r, then Z(y) < Z(x ) for all y £ A(x) 

(b) if x G then Z{y) < Z(x) for all y £ A(x) 

(Hi) the mapping A is closed at all point X T 

Then the limit of any convergent subsequence of {xk}'j?_ 0 is a solution, and Zf —>■ Z{x*) 
for some x* £ T. 

The algorithm A is here the alternating minimization Alg. [2] The solution set T is 
then naturally the set of the critical points of the functional TL. Equivalenly, T is the 
set of the fixed point of Alg. [2j Indeed, for a being fixed, F* is a minimizer of TL a is 
equivalent to F* is a fixed point of Alg. [I] [14]. The descent function Z is then naturraly 
the functional TL. 

As we work in the finite dimensional case, the boundeness of the sequence and then 
point (i) of the Theorem is here direct consequences of point (iii). Moreover, thanks to 
the continuity of the soft-thresholding operator, the mapping A is continuous and point 

(iii) is also a direct consequence of point (iii). 

Point (iii) of the theorem comes from the monotonic version of FISTA. Indeed, 
the very first iteration of FISTA is equivalent to a ’’simple” forward-backward step, 
which ensure the strict decreasing of the functional TLa (see [48| 1. Then, as a is the 
unique minimizer of the function Tip , we have a sequence {(a*,, F k)}^L 0 such that 

H ^a fc+1 , F k+1 ^j < TL (a k ,F k ^J as soon as ^a k ,F / 'J is not a critical point of TL. 
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